Do Characters Abuse More Than Words?

Authors

  • Yashar Mehdad
  • Joel R. Tetreault
Abstract

Although word and character n-grams have been used as features in many NLP applications, no systematic comparison or analysis has established how effective character-based features are for detecting abusive language. In this study, we investigate the effectiveness of such features for abusive language detection in user-generated online comments, and show that such methods outperform previous state-of-the-art approaches and other strong baselines.
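To make the distinction between the two feature types concrete, the following is a minimal sketch of character and word n-gram counting. It is not the authors' implementation: the function names `char_ngrams` and `word_ngrams`, the n-gram length ranges, and the lowercasing and whitespace tokenization are all assumptions chosen for illustration.

```python
from collections import Counter


def char_ngrams(text, n_min=2, n_max=4):
    """Count character n-grams of length n_min..n_max (hypothetical helper)."""
    text = text.lower()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def word_ngrams(text, n_min=1, n_max=2):
    """Count word n-grams over whitespace tokens (hypothetical helper)."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# An obfuscated spelling shares no word unigram with the plain spelling,
# but the two still overlap in some character n-grams.
shared_chars = set(char_ngrams("idiot")) & set(char_ngrams("id1ot"))
shared_words = set(word_ngrams("idiot", 1, 1)) & set(word_ngrams("id1ot", 1, 1))
```

In this toy case the word-level feature sets are disjoint while the character-level sets still overlap, which is one plausible reason character features could be more robust to the creative spellings common in user-generated comments.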


Similar resources

Quote Attribution for Literary Text with Neural Networks

We propose a method for using neural networks to attribute quotes in literary texts. Since previous work has been unable to successfully solve this problem based on bag-of-words features, we study the issue of whether this is due to the limited expressiveness of such features. By re-framing the modeling of quotes and characters as based off of word vectors, we hope to demonstrate that individua...


A Structural Approach for Segmentation of Handwritten Hindi Text

This paper makes an attempt to segment the handwritten Hindi words. The problem of segmentation is compounded by the possible presence of modifiers (matras) on all sides of the basic characters and due to the uncertainty introduced in the character shapes by way of different writing styles. We have devised a structural approach to capture the similarities and differences between structure class...


Character Decomposition and Transposition Processes in Chinese Compound Words Modulates Attentional Blink

The attentional blink (AB) is the phenomenon in which the identification of the second of two targets (T2) is attenuated if it is presented less than 500 ms after the first target (T1). Although the AB is eliminated in canonical word conditions, it remains unclear whether the character order in compound words affects the magnitude of the AB. Morpheme decomposition and transposition of Chinese t...


Phonological Codes Constrain Output of Orthographic Codes via Sublexical and Lexical Routes in Chinese Written Production

To what extent do phonological codes constrain orthographic output in handwritten production? We investigated how phonological codes constrain the selection of orthographic codes via sublexical and lexical routes in Chinese written production. Participants wrote down picture names in a picture-naming task in Experiment 1 or response words in a symbol-word associative writing task in Experiment 2...


Extension of Zipf's Law to Word and Character N-grams for English and Chinese

It is shown that for a large corpus, Zipf's law for both words in English and characters in Chinese does not hold for all ranks. The frequency falls below the frequency predicted by Zipf's law for English words for rank greater than about 5,000 and for Chinese characters for rank greater than about 1,000. However, when single words or characters are combined together with n-gram words or chara...




Publication date: 2016